PROJECT 1 - DATA SCIENCE

Authors:

Ariel Epshtein, ID: 316509504

Maor Mohav, ID: 316142363

Omer Ben David, ID: 316344449


Project structure:


  1. The project stores the values used to run the model in a dictionary, which is read by the functions we have written. In PyCharm these values are received as input from the user through the GUI; in Jupyter they are default values that we have chosen and that can be changed.

  2. In each experiment, initial processing steps will be performed.

  3. The project contains models and algorithms that use built-in libraries, as well as models and algorithms that we implemented ourselves (functions whose names end with the suffix _Implementation).

  4. Before running the project file in PyCharm, the user should read the Readme.text file.

  5. We added documentation for all the functions in the PyCharm project.
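
The parameter dictionary described in item 1 could look roughly like the sketch below. Every key name and default value here is a hypothetical placeholder chosen for illustration, not the project's actual configuration; `build_params` is likewise an assumed helper that overlays user input (for example from a GUI) on the defaults.

```python
# Hypothetical parameter dictionary: all keys and defaults are illustrative
# assumptions, not the values used in the actual project.
DEFAULT_PARAMS = {
    "missing_values": "mean",
    "normalization": True,
    "discretization": "equal_width",
    "number_of_bins": 3,
    "model": "knn",
    "k": 5,
}


def build_params(user_input=None):
    """Return a copy of the defaults, overridden by any user-supplied values."""
    params = dict(DEFAULT_PARAMS)
    for key, value in (user_input or {}).items():
        if key not in DEFAULT_PARAMS:
            raise KeyError(f"unknown parameter: {key}")
        params[key] = value
    return params


jupyter_params = build_params()                    # defaults only
gui_params = build_params({"k": 3, "model": "knn"})  # values typed by a user
```

Rejecting unknown keys early is one possible design choice: a misspelled parameter then fails loudly instead of silently falling back to a default.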

Libraries:

Uploading data

EDA:

Our analysis:


  1. Our data has 16 columns: 6 numeric, 6 categorical, and 4 boolean.
  2. Our data has 29 missing values.

  3. Column correlations:
    contact is highly correlated with month
    education is highly correlated with job
    job is highly correlated with education
    month is highly correlated with contact and 2 other fields
    day is highly correlated with month
    month is highly correlated with housing
    housing is highly correlated with month

Delete classification missing data
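
A minimal, library-free sketch of this step, assuming the data is held as a list of row dictionaries and that a missing value is represented by `None` or an empty string. The column name `"y"` and the helper name `drop_missing_class` are hypothetical.

```python
def drop_missing_class(rows, class_column):
    """Keep only the rows whose class label is present."""
    return [row for row in rows if row.get(class_column) not in (None, "")]


# Toy rows; "y" is an assumed name for the classification column.
rows = [
    {"age": 30, "y": "yes"},
    {"age": 41, "y": None},     # class label missing: row is dropped
    {"age": None, "y": "no"},   # feature missing: row is kept for later imputation
]
cleaned = drop_missing_class(rows, "y")
```

Rows with a missing label are removed rather than filled in, since an imputed class would add made-up supervision to the training data.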

Complete Missing Data
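
One common way to complete missing values is to fill numeric columns with the column mean and other columns with the most frequent value. The sketch below assumes that strategy and a list-of-dictionaries layout with `None` marking a missing value; the project itself may use a different rule, and `fill_missing` is a hypothetical helper.

```python
from collections import Counter
from statistics import mean


def fill_missing(rows, column):
    """Fill None in `column` with the mean (numeric) or the mode (otherwise)."""
    present = [row[column] for row in rows if row[column] is not None]
    if not present:
        return rows  # no observed values to derive a fill value from
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
    )
    fill = mean(present) if numeric else Counter(present).most_common(1)[0][0]
    return [
        {**row, column: fill if row[column] is None else row[column]}
        for row in rows
    ]


rows = [
    {"age": 30, "job": "admin"},
    {"age": None, "job": "technician"},
    {"age": 50, "job": None},
    {"age": 40, "job": "admin"},
]
filled = fill_missing(fill_missing(rows, "age"), "job")
```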

Split the data
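
A minimal sketch of a random train/test split, assuming a 70/30 ratio and a fixed seed for reproducibility. Both values, and the function name `train_test_split_rows`, are illustrative assumptions.

```python
import random


def train_test_split_rows(rows, test_ratio=0.3, seed=0):
    """Shuffle a copy of the rows and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]


data = list(range(10))
train, test = train_test_split_rows(data, test_ratio=0.3, seed=1)
```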

Normalization
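
A sketch of normalization, assuming min-max scaling to the range [0, 1] (other schemes, such as z-score standardization, are equally plausible here). The minimum and maximum are learned from the training values only and then reused on the test values, so no information leaks from the test set. The function names are hypothetical.

```python
def fit_min_max(values):
    """Learn the scaling bounds from the training values."""
    return min(values), max(values)


def apply_min_max(values, lo, hi):
    """Scale values with previously learned bounds; constant columns map to 0."""
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


train_age = [10, 20, 30]
test_age = [15, 40]
lo, hi = fit_min_max(train_age)
train_scaled = apply_min_max(train_age, lo, hi)
test_scaled = apply_min_max(test_age, lo, hi)  # may fall outside [0, 1]
```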

Discretization
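
A sketch of one possible discretization method, equal-width binning, in which the value range is cut into `n_bins` intervals of the same width. Whether the project uses equal-width, equal-frequency, or another method is not stated here, so treat the choice and the function name `equal_width_bins` as assumptions.

```python
def equal_width_bins(values, n_bins):
    """Map each value to a bin index in 0..n_bins-1 using equal-width intervals."""
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]  # a constant column falls into a single bin
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


balance = [0, 5, 10, 15, 20, 30]
binned = equal_width_bins(balance, 3)
```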

Encoder
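
A minimal sketch of encoding a categorical column as integers, assuming simple label encoding with categories ordered alphabetically. The mapping is built from the training values and then applied wherever needed; the function names are hypothetical.

```python
def fit_label_encoder(values):
    """Assign each distinct category an integer, in sorted order."""
    return {label: index for index, label in enumerate(sorted(set(values)))}


def encode(values, mapping):
    """Replace each category by its integer code."""
    return [mapping[value] for value in values]


marital = ["married", "single", "divorced", "single"]
mapping = fit_label_encoder(marital)
encoded = encode(marital, mapping)
```

Label encoding imposes an artificial order on the categories. For distance-based models such as KNN, one-hot encoding is a frequently used alternative.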

Saving files

Model:

KNN
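
A library-free sketch of k-nearest-neighbors classification, assuming Euclidean distance and an unweighted majority vote. It illustrates the general technique and is not the project's own `_Implementation` function; the name `knn_predict` and the toy data are made up.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest points."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


points = [(0, 0), (0, 1), (1, 0), (5, 5), (6, 5), (5, 6)]
labels = ["no", "no", "no", "yes", "yes", "yes"]
near_origin = knn_predict(points, labels, (0.5, 0.5), k=3)
near_five = knn_predict(points, labels, (5.5, 5.5), k=3)
```

Because the vote depends on raw distances, features should be on comparable scales before this step, which is one reason normalization precedes the model.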

K-means
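
A library-free sketch of k-means clustering using Lloyd's algorithm: assign every point to its nearest centroid, recompute each centroid as the mean of its members, and repeat until the assignment stops changing. Picking the initial centroids at random from the data, the fixed seed, and the name `k_means` are assumptions made for illustration.

```python
import math
import random


def k_means(points, k, iterations=100, seed=0):
    """Cluster `points` into k groups; returns (centroids, assignment)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids drawn from the data
    assignment = []
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid for every point.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(
                    sum(dim) / len(members) for dim in zip(*members)
                )
    return centroids, assignment


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, assignment = k_means(points, k=2)
```

In general the result of k-means depends on the initial centroids, so running it with several seeds and keeping the best clustering is a common precaution.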